Time and event-specific deep learning for personalized risk assessment after cardiac perfusion imaging

Standard clinical interpretation of myocardial perfusion imaging (MPI) has proven prognostic value for predicting major adverse cardiovascular events (MACE). However, personalizing predictions to a specific event type and time interval is more challenging. We demonstrate an explainable deep learning model that predicts the time-specific risk separately for all-cause death, acute coronary syndrome (ACS), and revascularization directly from MPI and 15 clinical features. We train and test the model internally using 10-fold hold-out cross-validation (n = 20,418) and externally validate it in three separate sites (n = 13,988) with MACE follow-ups for a median of 3.1 years (interquartile range [IQR]: 1.6, 3.6). We evaluate the model using the cumulative dynamic area under receiver operating curve (cAUC). The best model performance in the external cohort is observed for short-term prediction – in the first six months after the scan, mean cAUC for ACS and all-cause death reaches 0.76 (95% confidence interval [CI]: 0.75, 0.77) and 0.78 (95% CI: 0.78, 0.79), respectively. The model outperforms conventional perfusion abnormality measures at all time points for the prediction of death in both internal and external validations, with improvement increasing gradually over time. Individualized patient explanations are visualized using waterfall plots, which highlight the contribution degree and direction for each feature. This approach allows the derivation of individual event probability as a function of time as well as patient- and event-specific risk explanations that may help draw attention to modifiable risk factors. Such a method could help present post-scan risk assessments to the patient and foster shared decision-making.


INTRODUCTION
Through recent advances, artificial intelligence (AI) has established an important new paradigm in medical image analysis, potentially enhancing prognostic applications from all cardiovascular imaging modalities. However, there are challenges in how risks are conveyed to both physicians and patients to facilitate the best and most appropriate preventative strategies. A single metric of all risks combined is perhaps less useful than a diverse map of individual risks, their timeline, and influencing factors 1 .
Myocardial perfusion imaging (MPI) is a well-established technique for diagnosing coronary artery disease. Although the primary purpose of MPI is the assessment of flow-limiting coronary artery disease, it is often used for risk stratification 2 . Prognostic risk assessment has been based on the distribution and burden of ischemia detected, usually combined with a composite score of clinical risk factors or inferred from an expert clinical impression. While this has proven to be a generally successful model from a statistical viewpoint, it is arguably relatively crude at the individual patient level. In the status quo, although a patient may be informed that they are at high risk for an adverse event, they are left with less information about what type of event can be anticipated, or within what timeframe, and this can be more unnerving than productive. While researchers have successfully used AI to facilitate, quantify, and automate several aspects of the conventional imaging workflow for diagnosing disease 3 , the efforts applied to prognostic interpretation lack time-specific or event-specific prediction 4 . To date, proposed predictive AI models for cardiovascular image interpretation do not differentiate between the possible adverse events and oversimplify the predicted risk to a single numeric value despite the richness and depth of the source data 5 . Compounding this issue is the significant heterogeneity of the definitions of composite endpoints, such as major adverse cardiovascular events (MACE), in prior clinical studies 6 .
To date, no methods are established to predict time-dependent risks of specific event types (such as death or myocardial infarction) from a single model after cardiovascular imaging. In this study, we aimed to create a deep learning model capable of predicting patient- and event-specific risk over time directly from combined cardiac perfusion image and clinical data. We also describe methods for visual explanation of these predicted risks over time, which can be presented to physicians and patients. This could be applied to patient care by presenting patient-individualized survival curves for specific events and explaining the contribution of risk factors to event risk, potentially leading to patient engagement and tailoring therapy for the prevention of adverse events. The overview of the study is presented in Fig. 1.

Population characteristics
The training and internal testing cohort included 20,401 patients followed up for MACE for a median of 4.4 years (interquartile range [IQR]: 3.4, 5.7). All-cause death was observed in 1,396 patients (6.8%) and the median time to death was 2.3 years (IQR: 1.1, 3.7). ACS was observed in 657 patients (3.2%) and the median time to that event was 1.6 years (IQR: 0.6, 3.0). Revascularization was observed in 1,485 cases (7.3%) and the median time to revascularization was 0.6 years (IQR: 0.1, 2.3). A summary of the clinical characteristics of the derivation cohort is shown in Table 1.
The external testing set included 13,988 patients followed up for MACE for a median of 3.1 years (IQR: 1.6, 3.6). All-cause death was observed in 683 patients (5%) and occurred after a median of 1.5 (IQR: 0.6, 2.5) years from the scan. Acute coronary syndrome (ACS) was observed in 361 (2.5%) of patients after a median of 1.3 years from scan (IQR: 0.5, 2.3) and 918 patients (6.6%) underwent revascularization after a median of 0.1 years from baseline imaging (IQR: 0.03, 1.3). Summary of the clinical characteristics of the external cohort is shown in Table 2.

Internal testing
We present the cumulative dynamic area under receiver operating curve (cAUC) for the prediction of any event as well as each of the separate events in Fig. 2. The best performance for the prediction of ACS and all-cause death was observed for short-term prediction: in the first six months after scan, mean cAUC for ACS and all-cause death reached 0.78 (95% confidence interval [CI]: 0.77, 0.79) and 0.86 (95% CI: 0.85, 0.87), respectively. For revascularization, the initially high cAUC declined after the first year but achieved its peak values in long-term observation: mean cAUC in the fifth year of follow-up was 0.84 (95% CI: 0.82, 0.85). While the perfusion abnormality measure maintains high cAUC over a short-term observation for the prediction of ACS, revascularization or any event, its cAUC decreases over time faster than that of our model. The model has superior performance to the perfusion abnormality for death prediction at all time points for cAUC as shown in Table 3. Receiver-operating curve (ROC) plots for the prediction of death, ACS, and revascularization at 1, 3 and 5 years in the external testing sets using the time-to-event model are shown in Supplementary Fig. 1. In the sensitivity analysis, removing revascularization events within 180 days from the scan decreased performance for the prediction of revascularization but had no significant effect on the prediction of other events or the composite MACE outcome (Supplementary Table 2). The time-to-event model also outperformed the multivariable Cox regression model in the prediction of all types of events (Supplementary Table 3).
Fig. 1 Deep learning enabled time-to-event outcome prediction after cardiac imaging: study overview. a A time-to-event deep learning model was trained (left) using data from the 5 sites of the REFINE SPECT registry (n = 20,401), then tested internally in a 10-fold cross-validation regimen (middle) and tested in 3 external sites (n = 13,988) (right); b The time-to-event model uses 5 SPECT polar maps and 15 clinical features as inputs (left) and predicts time-dependent probability of death (orange line), ACS (green line), and revascularization (blue line); c The performance of the model (left) is analyzed using cumulative dynamic area under the receiver-operating curves (cAUC). Red line represents the time-to-event model and blue line represents perfusion abnormality. The explanation of the prediction is visualized as a waterfall plot with blue arrows representing features that decrease the risk and red arrows representing the features that increase the risk (right); ACS acute coronary syndrome, AUC area under the receiver operating characteristics curve, TPD total perfusion deficit, PCI percutaneous coronary intervention, CI confidence intervals.

Individual prediction and explanation
Examples of individualized predictions for four patients who experienced different types of outcomes in the follow-up period are shown in Fig. 4. The prediction of our model is presented as three cumulative incidence functions, separately for each type of event. The individual prediction plot is accompanied by a waterfall plot (Fig. 5) providing an explanation of the highest predicted risk that highlights how the polar maps and clinical features contribute to the overall risk. The waterfall plot allows for visualization of both the extent of influence (length of the arrow) and direction (increasing risk of the event: red arrow pointing to the right; decreasing risk: blue arrow pointing to the left). In the presented case, a 41-year-old female with heart failure with moderately reduced ejection fraction and moderate perfusion deficits is identified as having a high risk of death. Explanation of the prediction shows the elevated resting heart rate as one of the factors having the greatest contribution to the elevated risk. Simulating the reduction of the resting heart rate to 70/min shows that optimal guideline-guided management could lower the average predicted risk of death by 36%. Inference using our model took less than 12 milliseconds per patient case on an Apple MacBook Pro laptop computer.

DISCUSSION
Leveraging a large cardiac imaging registry, we developed a deep learning approach for individual risk computation that allows time-dependent and event-specific predictions jointly from clinical and cardiac imaging data. We obtain time- and event-specific risk estimation and provide visually intuitive graphs for individual risk explanations. The model provides risk estimates over time for all-cause death, ACS, and late revascularization separately, with easy-to-understand patient-level explanations. We evaluated our model in a large, multi-site external dataset as well as with internal 10-fold cross-validation. Good performance in the external testing set points to the ability of our model to generalize to unseen real-life data from new centers. The model relies on the combined predictive potential of the clinical features, stress test data, and direct image analysis, similarly to the way clinicians try to integrate all available information to provide the most accurate study interpretation. Moreover, this approach also leverages time-to-event data to provide more robust risk estimation over time, which could potentially be applied to a broad range of AI tasks.
Previous prognostic studies estimated risk jointly using composite adverse events and without the use of time-to-event data 5 . A high risk of death in the next year is a very different scenario than a risk of hospital admission or revascularization over 10 years, but our current presentation and assessment of data lack this granularity. Recently, non-linear AI survival models have demonstrated practical implementations in healthcare 7-9 with state-of-the-art performances that are comparable to or improve the performance of traditional Cox proportional hazard models 10 . Examples of such models include precision genomic prognostication in patients receiving cancer treatment 11 , prediction of oral cancer survival 12 as well as of progression of potentially malignant disorders to cancer 13 . A large-scale, multi-site study investigated the use of deep neural networks trained using full electronic health records data to predict multiple medical events 14 . This is, however, to our knowledge, the first study to evaluate prediction at multiple time points of multiple events in a large multi-site registry of cardiovascular imaging data that also explicitly takes advantage of time-to-event data during model training. Patient-level explanation may be crucial for the clinical adoption of AI in medical imaging 15,16 , but this approach was not previously applied as a joint explanation of direct imaging data and clinical variables for individualized risks of specific events. Such explanations may point to abnormalities in the imaging data as well as to some clinical features that drive the increased risk for a given adverse event, potentially allowing for a more comprehensive assessment of the patient's condition. Event-specific predictions within a single model can be presented at the time of review of imaging and may enable physicians to practice precision medicine, with individually tailored treatments and preventive measures.
For instance, a prediction of high risk of all-cause death could encourage more frequent follow-up visits and additional diagnostic tests, while a high risk of ACS and revascularization could indicate that the patient is a candidate for revascularization or needs intensification of medical therapy.
In addition to informing the physician about the rationale behind model predictions, the visualization of factors contributing to increased risk of adverse events might serve as a powerful tool in shared decision-making after the exam, utilizing all available information 17 . When discussed with the patient, a special focus might be given to modifiable risk factors such as high BMI 18 , hypertension 19 , diabetes, and dyslipidemia 20 , leading to optimal, goal-directed medical therapy of these risk factors. That could be a starting point for a discussion on how these factors can be targeted through lifestyle modifications and medications. Such an approach could be an important step towards patient empowerment and could improve adherence to physicians' recommendations. However, it is important to acknowledge the limitations of SHapley Additive exPlanations (SHAP)-derived feature importance 21 , especially that the values do not imply causal relations between the input features and the outcome. For this reason, the waterfall plots (Fig. 5) that were generated based on SHAP values should be considered an illustrative tool and should be interpreted with caution.
Interestingly, in the external testing set, we found that the perfusion abnormality variable had lower performance than the clinical-only time-to-event model in predicting all-cause mortality at any time point. This confirms that clinical features, such as age and medical comorbidities, are important determinants of all-cause mortality and have previously been shown to influence the "warranty period" of normal perfusion on MPI 22 . Additionally, myocardial perfusion may change in response to anti-anginal therapies and thus would not be expected to be an accurate predictor of long-term hard outcomes 23 . However, unsurprisingly, the revascularization prediction performance was similar for the perfusion abnormality and the full time-to-event model and higher than for the clinical-only time-to-event model. This is expected because physicians may rely on perfusion information when making revascularization decisions 24,25 , which could lead to overestimation of its prediction performance for revascularization.
Our study has several limitations. First, we have only assessed all-cause mortality and could, therefore, not differentiate between cardiac and non-cardiac deaths. We separately considered the major cardiovascular events of ACS and revascularization. However, other possible events like atrial fibrillation, worsening of heart failure, or sub-classification of ACS (presence of ST-elevation) were not available for analysis due to the multi-site, retrospective nature of the imaging registry. In our model, the risk for death and ACS is estimated independently from the risk of revascularization, thus allowing for the event-specific assessment of the patient's prognosis, but it should be noted that an increased risk of ACS and revascularization would lead physicians to consider the same preventive strategies. Furthermore, while the dynamic cAUC of the standard quantitative perfusion analysis decreases over time for the revascularization prediction, our model maintains higher cAUC for the revascularization prediction in the long term. The performance of our model could be further improved by utilizing data from other imaging modalities 26 . For instance, computed tomography attenuation correction scans could be used to automatically calculate calcium score, which could be included in the time-to-event model 27 . Finally, the usefulness of the time-to-event predictions has not been evaluated in prospective studies. This is understandable given the novelty of the proposed methods. Further investigation is needed to assess if the additional temporal dimension of the model's prediction and its ability to differentiate the risk of specific events can improve physicians' workflows and lead to better clinical decisions.
The proposed deep learning model, using cardiac perfusion images and clinical data with time-to-event specific outcomes, provides a robust prediction of the risk of all-cause death, ACS, and revascularization. The model significantly improved the prediction of all-cause death and the composite MACE outcome, while also improving the prediction of ACS in the external testing population. By presenting the individualized patient-specific post-scan risk assessment over time in an intuitive manner for clinicians and patients, our approach can potentially help better address patient risk and guide management that is tailored to the patient's individual risk profile.

Patient populations
For the training and internal validation, we included 20,418 scans from five international centers participating in the prospective, multi-site Registry of Fast Myocardial Perfusion Imaging with Next generation SPECT (REFINE SPECT) 28 . We included all consecutive patients who underwent clinically indicated SPECT MPI from 2009 to 2014. We excluded 17 patients without gated studies, leaving a total of 20,401 patients.

Definition of events
Patients were followed for MACE, which was defined as all-cause death, myocardial infarction, unstable angina, and revascularization (surgical or percutaneous). Non-fatal myocardial infarction was defined as hospitalization for cardiac chest pain or anginal equivalent with positive cardiac biomarkers 29 . Unstable angina was defined as recent onset or escalating cardiac chest pain with negative cardiac biomarkers. All outcomes were adjudicated by experienced cardiologists after considering all available clinical data. We chose three outcomes as events of interest: death, acute coronary syndrome (ACS), defined as either non-fatal myocardial infarction or admission for unstable angina, and revascularization (with percutaneous coronary intervention or coronary artery bypass grafting). For each patient, only the first occurring event was considered and therefore, each patient had either one of the three events or no events. If a patient presented with ACS and had revascularization on the same day, that event was considered as an ACS. If a patient had either ACS or revascularization and died on the same day, that event was considered as a death. For area under receiver-operating curve (AUC) analysis, events that occurred up to a given time point were considered as positive events, and if the event occurred after the specified time point, the patient was considered event-free.
[Figure caption, continued] Red arrows pointing to the right indicate that the feature increases the risk of death, while blue arrows pointing to the left indicate that the feature decreases the risk. Patient features that point to potential actionable interventions through medication or lifestyle modification are highlighted using bold font. c Simulated modification of risk factors through reduction of resting heart rate to 70/min reduces the average risk of death by 36%. The predicted probabilities of death, ACS and revascularization before the intervention are shown as orange, green and blue lines, respectively. The predicted probabilities of death, ACS and revascularization after heart rate reduction are shown as orange, green and blue dashed lines, respectively. ACS acute coronary syndrome, CAD coronary artery disease, LVEF left ventricular ejection fraction, BMI body mass index, PCI percutaneous coronary intervention, CABG coronary artery bypass grafting.
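The first-event rules in this section can be illustrated with a minimal sketch. The event names, the tuple layout, and the function names below are hypothetical and are not taken from the study code. The sketch also assumes that a patient whose first event is of a different type counts as a negative for the event being analyzed, and it does not handle censoring.

```python
# Same-day tie-breaking, following the rules in the text:
# death outranks ACS and revascularization; ACS outranks revascularization.
PRIORITY = {"death": 0, "acs": 1, "revasc": 2}


def first_event(events):
    """Return (event_type, day) of the first event, or None if event-free.

    `events` is a list of (event_type, days_from_scan) tuples (assumed layout).
    """
    if not events:
        return None
    # Earliest day wins; on the same day the higher-priority event wins.
    return min(events, key=lambda e: (e[1], PRIORITY[e[0]]))


def label_at(events, horizon_days, event_type):
    """Binary label for an AUC analysis at a given time point.

    1 if `event_type` was the first event and occurred by the horizon,
    0 otherwise (including events occurring after the horizon).
    """
    first = first_event(events)
    return int(first is not None
               and first[0] == event_type
               and first[1] <= horizon_days)
```

For example, a patient with ACS and revascularization on day 10 is assigned ACS as the first event, and an ACS on day 400 counts as event-free in a 1-year (365-day) analysis.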

External cohort
The external testing population included an additional 13,988 patients who underwent clinically indicated SPECT MPI with MACE follow-up at three separate external centers: Oklahoma Heart Hospital (n = 6034), University of Calgary Hospital (n = 2985) and Yale New Haven Hospital (n = 4969). All outcomes were adjudicated using the same criteria as in the training cohort.

Image collection
Patients were imaged with either a DSPECT (Spectrum-Dynamics, Caesarea, Israel), GE Discovery NM 530c, or NM/CT570c (GE Healthcare, Haifa, Israel) camera system. Patients underwent either symptom-limited exercise testing or pharmacologic stress. Additional details regarding imaging protocols and acquisition have been previously described 28 .
After anonymization, all images were transferred to Cedars-Sinai Medical Center, where quality control was performed by experienced core laboratory technologists without knowledge of the clinical data. Left ventricular (LV) myocardial contours were computed and verified by an experienced nuclear medicine technologist using standard clinical software 30 . Polar maps of the LV, representing a compressed form of the images, were automatically generated from the images. Five polar maps were derived for each patient: perfusion, motion, thickening, cardiac phase, and amplitude. Clinical data and images from the external centers were de-identified and transferred to Cedars-Sinai. This study complies with the Declaration of Helsinki.

Ethical approvals
The institutional review boards at Cedars-Sinai and the participating sites approved the collection of data for the registry.

Design and training of the event-specific deep learning network
We employed a deep-learning-based approach capable of learning the distribution of event 'hitting times' directly from data. We extended the DeepHit architecture and associated loss function 31 and implemented it with the PyTorch framework 32 . To allow the network to process images, we added convolutional layers capable of directly interrogating perfusion, motion, wall thickening, and phase polar maps and combining the imaging data with 15 clinical features that were chosen based on our previous work on the minimum set of variables for machine learning cardiovascular event prediction 33 . The network consists of two main parts:
1. A convolutional part that processes the 28 × 36 × 5 input of 5 normalized polar maps and consists of 2 convolution blocks, each with 3 × 3 convolution kernels, batch normalization, dropout, and Leaky Rectified Linear Unit (ReLU) layers, which were added to prevent overfitting.
2. A fully connected part in which the imaging features are passed through a fully connected layer with 512 nodes and the 15 clinical features are passed to a separate fully connected layer with 32 nodes.
The output of these layers is concatenated and passed to the DeepHit network described by Lee et al. 31 , with 256 nodes in a single shared layer and 256 nodes in each of the three event-specific layers.
The output of the model is a 2-D array of shape 3 × 131, representing probabilities of each of the events occurring at time 0 and every 30 days up to the maximum follow-up time. We used the loss functions proposed by Lee et al. 31 with modification by Kvamme et al. 32 .
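To make the output format concrete, the sketch below converts a 3 × 131 array of per-interval event probabilities into cumulative incidence curves and reads off the risk at a chosen number of days. It assumes that the row order is death, ACS, revascularization and that bin t corresponds to day 30·t (so the last bin is day 3,900); both are assumptions for illustration, and the function names are hypothetical.

```python
from itertools import accumulate

N_EVENTS = 3     # assumed row order: death, ACS, revascularization
N_BINS = 131     # time 0 plus 130 steps of 30 days
STEP_DAYS = 30


def cumulative_incidence(pmf):
    """Turn per-bin event probabilities into cumulative incidence curves.

    `pmf[k][t]` is the probability that event k occurs in time bin t.
    Returns a list of the same shape holding running sums along time.
    """
    return [list(accumulate(row)) for row in pmf]


def risk_by(cif, event_index, days):
    """Cumulative risk of one event up to `days` after the scan."""
    # Days beyond the grid are clamped to the last available bin.
    bin_index = min(days // STEP_DAYS, N_BINS - 1)
    return cif[event_index][bin_index]
```

With such curves, the 1-year risk of death for a patient would be read as `risk_by(cif, 0, 365)`, and plotting each row of `cif` against time gives one curve per event type.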
Missing values (Supplementary Table 5) were imputed using the mean or, for categorical features, the mode of the values in the training set. This method was previously shown to perform similarly to other data imputation techniques 34 .
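A minimal sketch of this imputation scheme follows. The key point it illustrates is that fill values are learned from the training set only and then reused for any other data. The row layout (dictionaries with `None` for missing values) and the function names are assumptions made for illustration.

```python
from statistics import mean, mode


def fit_imputer(train_rows, categorical):
    """Learn fill values from the training rows only.

    `train_rows` is a list of dicts; missing values are stored as None.
    `categorical` is the set of column names to impute with the mode.
    """
    fill = {}
    for col in train_rows[0]:
        observed = [r[col] for r in train_rows if r[col] is not None]
        fill[col] = mode(observed) if col in categorical else mean(observed)
    return fill


def impute(rows, fill):
    """Return copies of `rows` with None replaced by the learned values."""
    return [{c: (fill[c] if v is None else v) for c, v in r.items()}
            for r in rows]
```

Calling `fit_imputer` on the training fold and `impute` on the testing or external data avoids leaking information from the evaluation data into the fill values.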
An important aspect of the architecture is the ability to handle multiple competing events 35 and generate predictions separately for each of them. The resulting architecture is shown in Supplementary Fig. 2. The model generated predictions in the form of a 2-D array of monthly event probabilities for multiple events.

Additional analyses
To evaluate the usefulness of combining clinical and imaging data in a model, we trained and tested a separate model that used clinical features only. This model utilizes the same architecture as the time-to-event model, but without image input. Additionally, we performed a sensitivity analysis in the external testing population to investigate the effect of removing cases with revascularization events within 180 days from the MPI. We compared the AUC for the prediction of revascularization, death, ACS, and MACE in the external dataset at 1 year and 3 years from scan with and without removing the early revascularization cases.

Comparison with Cox regression model
For comparison, we created a multivariable Cox regression model that used all clinical features utilized by the time-to-event model and stress total perfusion deficit (perfusion abnormality). This model was trained in the internal set and evaluated in the external testing set. We compared AUC for the prediction of death, ACS, revascularization, and MACE at 1 and 3 years from scan.

Internal Training and testing routine
The model was trained and tested in a 10-fold repeated hold-out regimen. The development set was randomly divided into 10 samples (folds) with the same fraction of each MACE event (stratified split). Then, 10 separate models were trained, each using 9 of 10 folds for training and the remaining one for testing. Within the 90% of training data, the model randomly selected 20% of cases that were used for model hyperparameter optimization in this fold. There was no overlap of training data with the testing data at any point. Testing results from each of the 10 folds and 10 models were concatenated for robust assessment of the overall performance in unseen data.
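The splitting routine described above can be sketched as follows. This is an illustrative reimplementation, not the study code: the function names, the use of first-event labels for stratification, and the seeds are assumptions. Following the text, the 20% tuning subset is drawn at random from the training portion rather than stratified.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign case indices to folds so event fractions stay similar.

    `labels[i]` is the first-event label of case i (e.g. "none", "death").
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Deal the cases of each label round-robin across the folds.
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)
    return folds


def splits(labels, n_folds=10, val_fraction=0.2, seed=0):
    """Yield (train, validation, test) index lists, one triple per fold."""
    folds = stratified_folds(labels, n_folds, seed)
    rng = random.Random(seed + 1)
    for k in range(n_folds):
        test = folds[k]
        rest = [i for f in folds[:k] + folds[k + 1:] for i in f]
        rng.shuffle(rest)
        n_val = int(len(rest) * val_fraction)
        # The validation subset is used for hyperparameter tuning only.
        yield rest[n_val:], rest[:n_val], test
```

Concatenating the test predictions from all ten triples gives one out-of-fold prediction per patient, which matches the way the internal results are pooled in the text.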

External testing
The generalizability of the approach to data from new medical centers was evaluated in an external testing regimen. For the robust estimation of external performance, each of the 10 models generated in 10-fold cross-validation of the REFINE SPECT cohort was evaluated in a separate external cohort from three new centers. Performance was then evaluated separately for the 10 sets of predictions and presented as mean with 95% confidence intervals after bootstrapping.

Patient-specific explanations
We provide explanations of individualized predictions made by the algorithm. This approach allows for the identification of important patient-specific features driving the prediction and provides a feature importance ranking for each patient, separately for each of the three outcomes. The individual explanation of the predicted probability of each of the target events was achieved through the generation of SHapley Additive exPlanations (SHAP values) 36 . To provide a meaningful explanation, we summed the SHAP values for all pixels in each image input and presented them alongside the importance of clinical features in the form of waterfall plots. For each of the top contributing features, the waterfall plot visualizes how strongly the given feature increases or decreases the risk for the specific event in a specific patient.
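The aggregation step described above (summing pixel-level SHAP values per image and ranking them together with clinical features) can be sketched as below. It assumes SHAP values have already been computed by some explainer; the data layout, feature names, and function names are hypothetical.

```python
def waterfall_rows(image_shap, clinical_shap, top_n=10):
    """Collapse pixel-level SHAP values and rank all features.

    `image_shap` maps a polar-map name to a 2-D list of per-pixel values.
    `clinical_shap` maps a clinical feature name to a single SHAP value.
    Returns (name, value) pairs, largest absolute contribution first.
    """
    # One number per image input: the sum over all of its pixels.
    contributions = {name: sum(sum(row) for row in grid)
                     for name, grid in image_shap.items()}
    contributions.update(clinical_shap)
    ranked = sorted(contributions.items(),
                    key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:top_n]


def direction(value):
    """Positive values raise the predicted risk, negative values lower it."""
    return "increases risk" if value > 0 else "decreases risk"
```

Each returned pair corresponds to one arrow in a waterfall plot: the absolute value gives the arrow length and the sign gives its direction.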

Statistical analysis
Continuous variables were expressed as median (interquartile range [IQR]). Two-sided Kruskal-Wallis tests were used to compare differences in median values. Categorical variables were compared using Fisher's exact test. A p value <0.05 was considered statistically significant. Univariable comparisons and summary statistics were computed using R 4.1.2 and R Studio. Details on the software packages and versions used are given in Supplementary Table 6.
The predictive performance of the model was evaluated using the time-dependent concordance index 37 , which extends the concordance index to time-dependent predictions, and the cumulative dynamic area under the receiver-operating characteristic (ROC) curve (cAUC) 38 as implemented in the scikit-survival Python package. This measure reflects the probability that, given two randomly chosen patients, one having failed before time T and the other having failed after T, the prognostic marker will correctly rank them. We used plots of the cAUC values as a function of time from scan to visualize the ability to capture temporal changes in the risk of adverse events. The 95% confidence limits for the cAUC curves were established using bootstrapping (100 samples with replacement). We compared the cAUC of our model's output with the clinical-only model and with the extent of perfusion abnormality (stress total perfusion deficit) 30 , which is an established, clinically used quantitative MPI variable. In previous studies, the total perfusion deficit measure demonstrated efficient risk stratification 39 and identification of patients who may benefit from early revascularization 24 .
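The pairwise-ranking interpretation of the cAUC and the bootstrap confidence limits can be illustrated with a simplified sketch. It is not the scikit-survival implementation: it omits the inverse probability of censoring weights that the package applies and simply drops patients censored before time T. All names are hypothetical.

```python
import random


def dynamic_auc(times, events, scores, t):
    """Simplified cumulative/dynamic AUC at time t (no censoring weights).

    Cases: event observed at or before t. Controls: followed beyond t.
    Patients censored before t are ignored in this sketch.
    """
    cases = [s for ti, e, s in zip(times, events, scores) if e and ti <= t]
    controls = [s for ti, s in zip(times, scores) if ti > t]
    if not cases or not controls:
        return float("nan")
    # Fraction of case/control pairs ranked correctly; ties count as half.
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_ci(times, events, scores, t, n_boot=100, seed=0):
    """Percentile 95% interval from resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(times)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        auc = dynamic_auc([times[i] for i in idx],
                          [events[i] for i in idx],
                          [scores[i] for i in idx], t)
        if auc == auc:  # skip NaN from resamples lacking cases or controls
            aucs.append(auc)
    aucs.sort()
    lower = aucs[int(0.025 * len(aucs))]
    upper = aucs[min(len(aucs) - 1, int(0.975 * len(aucs)))]
    return lower, upper
```

Evaluating `dynamic_auc` on a grid of time points (for example, every 30 days) and plotting the values with their bootstrap limits gives a curve of the kind described in the text.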
Additionally, we evaluated the model using area under the ROC curves (AUC) at three time points in the internal testing set (1, 3, and 5 years from scan), separately for all-cause death, ACS, and revascularization. AUCs were compared at each time point and for each event using DeLong's test.

Compliance with recommendations for machine-learning-related research
This study was designed and conducted following the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) 40 checklist that is included as Supplementary Table 7.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
De-identified data supporting this study may be shared based on reasonable written request to the corresponding author. Access to de-identified data will require a Data Access Agreement and IRB clearance, which will be considered by the institutions that provided the data for this research.